Run Compressed Rank/Select for Large Alphabets

Authors

  • José Fuentes Sepúlveda
  • Juha Kärkkäinen
  • Dmitry Kosolobov
  • Simon J. Puglisi
Abstract

Given a string of length $n$ that is composed of $r$ runs of letters from the alphabet $\{0, 1, \ldots, \sigma-1\}$ such that $2 \le \sigma \le r$, we describe a data structure that, provided $r \le n/\log n$, stores the string in $r \log\frac{n\sigma}{r} + o(r \log\frac{n\sigma}{r})$ bits and supports select and access queries in $O(\log\frac{\log(n/r)}{\log\log n})$ time and rank queries in $O(\log\frac{\log(n\sigma/r)}{\log\log n})$ time. We show that $r \log\frac{n(\sigma-1)}{r}$ bits are necessary for any such data structure and, thus, our solution is succinct. We also describe a data structure that uses $(1+\epsilon) r \log\frac{n\sigma}{r} + O(r)$ bits, where $\epsilon > 0$ is an arbitrary constant, with the same query times but without the restriction $r \le n/\log n$. By simple reductions to the colored predecessor problem, we show that the query times are optimal in the important case $r \ge 2^{\log^{\delta} n}$, for an arbitrary constant $\delta > 0$. We implement our solution and compare it with the state of the art, showing that the closest competitors consume 31–46% more space.
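To make the three queries in the abstract concrete, here is a minimal Python sketch of access, rank, and select over a run-length encoded string. It is not the data structure described in the paper: it is a naive baseline that answers each query with a binary search over the runs, in O(log r) time, and makes no attempt at succinct space. The class name `RunLengthString` and all of its fields are hypothetical, chosen only for this illustration.

```python
from bisect import bisect_right


class RunLengthString:
    """Naive run-length baseline (an assumption for illustration, not the paper's structure)."""

    def __init__(self, text):
        self.n = len(text)
        # starts[i] is the 0-based position where run i begins; letters[i] is its symbol.
        self.starts, self.letters = [], []
        for pos, ch in enumerate(text):
            if not self.letters or self.letters[-1] != ch:
                self.starts.append(pos)
                self.letters.append(ch)
        # Per letter c: start and length of each run of c, and the number of
        # occurrences of c that precede that run.
        self.run_starts, self.run_lens, self.before, self.totals = {}, {}, {}, {}
        for i, ch in enumerate(self.letters):
            end = self.starts[i + 1] if i + 1 < len(self.starts) else self.n
            length = end - self.starts[i]
            self.run_starts.setdefault(ch, []).append(self.starts[i])
            self.run_lens.setdefault(ch, []).append(length)
            self.before.setdefault(ch, []).append(self.totals.get(ch, 0))
            self.totals[ch] = self.totals.get(ch, 0) + length

    def access(self, i):
        """Return the letter at 0-based position i."""
        if not 0 <= i < self.n:
            raise IndexError("position out of range")
        return self.letters[bisect_right(self.starts, i) - 1]

    def rank(self, c, i):
        """Return the number of occurrences of c among positions 0..i-1."""
        rs = self.run_starts.get(c, [])
        k = bisect_right(rs, i - 1) - 1  # last run of c starting before position i
        if k < 0:
            return 0
        return self.before[c][k] + min(i - rs[k], self.run_lens[c][k])

    def select(self, c, j):
        """Return the 0-based position of the j-th occurrence of c (j is 1-based)."""
        if not 1 <= j <= self.totals.get(c, 0):
            raise ValueError("no such occurrence")
        bf = self.before[c]
        k = bisect_right(bf, j - 1) - 1  # run of c containing the j-th occurrence
        return self.run_starts[c][k] + (j - 1 - bf[k])


s = RunLengthString("aaabbbbaacc")  # four runs: aaa, bbbb, aa, cc
print(s.access(4), s.rank("a", 8), s.select("a", 4))
```

The sketch stores a handful of machine words per run, so its space grows with r rather than n, which is the same intuition behind the bounds above; the paper's contribution is to bring that space down to the stated succinct bound while also improving on the O(log r) query time used here.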

Similar resources

Practical Rank/Select Queries over Arbitrary Sequences

We present a practical study on the compact representation of sequences supporting rank, select, and access queries. While there are several theoretical solutions to the problem, only a few have been tried out, and there is little idea on how the others would perform, especially in the case of sequences with very large alphabets. We first present a new practical implementation of the compressed...

Alphabet Partitioning for Compressed Rank/Select with Applications

We show how, if we have a data structure that efficiently supports access, rank and select queries on strings in compressed form, and another that supports those queries efficiently on strings over large alphabets, we can combine their strengths via alphabet partitioning. Specifically, we present a data structure that stores a string s[1..n] over alphabet [1..σ] in nH0(s)+o(n)(H0(s)+1) bit...

CSA++: Fast Pattern Search for Large Alphabets

Indexed pattern search in text has been studied for many decades. For small alphabets, the FM-Index provides unmatched performance, in terms of both space required and search speed. For large alphabets – for example, when the tokens are words – the situation is more complex, and FM-Index representations are compact, but potentially slow. In this paper we apply recent innovations from the field ...

Rank and select: Another lesson learned

Rank and select queries on bitmaps are essential building blocks of many compressed data structures, including text indexes, membership and range-supporting spatial data structures, compressed graphs, and more. Although these primitives were considered theoretically as early as the 1980s, their practical implementations have also been the subject of active research over the last decade. We present a few novel rank...

Grammar Compressed Sequences with Rank/Select Support

Sequence representations supporting not only direct access to their symbols, but also rank/select operations, are a fundamental building block in many compressed data structures. In several recent applications, the need to represent highly repetitive sequences arises, where statistical compression is ineffective. We introduce grammar-based representations for repetitive sequences, which use up ...

Journal:
  • CoRR

Volume: abs/1711.02910

Publication date: 2017